SemanticScuttle - klotz.me » klotz: machine learning+large language models

klotz: machine learning* + large language models*

New Trends in LLM Architecture

Discusses trends in large language model (LLM) architecture: the push toward more GPUs, more weights, and more tokens; energy-efficient implementations; the role of LLM routers; and the need for better evaluation metrics, faster fine-tuning, and self-tuning.

2024-06-01 Tags: llm, machine learning, deep learning, transformers, self-tuning, evaluation by klotz

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

This article discusses a method for automatically curating high-quality datasets for self-supervised pre-training of machine learning systems. The method involves successive and hierarchical applications of k-means on a large and diverse data repository to obtain clusters that distribute uniformly among data concepts, followed by a hierarchical, balanced sampling step from these clusters. The experiments on three different data domains show that features trained on the automatically curated datasets outperform those trained on uncurated data while being on par or better than ones trained on manually curated data.
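The curation procedure described above (k-means applied level by level, then balanced sampling down the cluster hierarchy) can be sketched in plain Python. This is a simplified two-level illustration, not the paper's implementation: the function names, the round-robin sampling rule, and the fixed iteration count are assumptions made for the sketch.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def hierarchical_kmeans(points, ks, seed=0):
    """Level 0 clusters the raw points; each later level clusters the previous
    level's centroids. levels[l][j] is the cluster id of item j at level l."""
    levels, items = [], points
    for level, k in enumerate(ks):
        centroids, labels = kmeans(items, k, seed=seed + level)
        levels.append(labels)
        items = centroids
    return levels


def balanced_sample(leaf_labels, top_of_leaf, n, seed=0):
    """Hypothetical balanced sampler for a two-level hierarchy: equal budget per
    top-level cluster, spread round-robin over that cluster's leaf clusters."""
    rng = random.Random(seed)
    tree = defaultdict(lambda: defaultdict(list))  # top -> leaf -> point indices
    for i, leaf in enumerate(leaf_labels):
        tree[top_of_leaf[leaf]][leaf].append(i)
    per_top = n // len(tree)
    chosen = []
    for top in sorted(tree):
        buckets = [rng.sample(idx, len(idx)) for _, idx in sorted(tree[top].items())]
        picked = []
        # Take one point from each leaf in turn so no dense leaf dominates.
        while len(picked) < per_top and any(buckets):
            for bucket in buckets:
                if bucket and len(picked) < per_top:
                    picked.append(bucket.pop())
        chosen.extend(picked)
    return chosen
```

Because every top-level cluster gets the same budget regardless of how many raw points it holds, a concept that dominates the uncurated pool cannot dominate the curated sample.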

2024-06-01 Tags: self-supervised learning, clustering, machine learning, k-means, feature training, llm by klotz

Contextual Transformer Embeddings Using Self-Attention Explained with Diagrams and Python Code

This article is part of a series titled ‘LLMs from Scratch’, a complete guide to understanding and building Large Language Models (LLMs). In this article, we discuss the self-attention mechanism and how it is used by transformers to create rich and context-aware transformer embeddings.

The Self-Attention mechanism is used to add context to learned embeddings, which are vectors representing each word in the input sequence. The process involves the following steps:

1. Learned Embeddings: These are the initial vector representations of words, learned during the training phase. The weight matrix that holds the learned embeddings is stored in the first linear layer of the Transformer architecture.

2. Positional Encoding: This step adds positional information to the learned embeddings. Positional information helps the model understand the order of the words in the input sequence: transformers process all words in parallel, so without it they would have no notion of word order.

3. Self-Attention: The core of the Self-Attention mechanism is to update the learned embeddings with context from the surrounding words in the input sequence. This mechanism determines which words provide context to other words, and this contextual information is used to produce the final contextualized embeddings.
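The three steps above can be sketched in dependency-free Python: an embedding-table lookup, an added positional encoding, and single-head scaled dot-product self-attention. The sinusoidal encoding from "Attention Is All You Need" is an assumption here (the article may use a different scheme), and the random weights stand in for trained ones.

```python
import math
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def random_matrix(rows, cols, rng):
    # Stand-in for trained weights (assumption for this sketch).
    return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]


def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention; returns (outputs, weights)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


def contextual_embeddings(token_ids, embedding, w_q, w_k, w_v):
    # Step 1: row t of the table equals one_hot(t) @ embedding, which is why the
    # table can be viewed as the weights of a bias-free linear layer.
    x = [embedding[t] for t in token_ids]
    # Step 2: add position information element-wise.
    pe = positional_encoding(len(token_ids), len(embedding[0]))
    x = [[e + p for e, p in zip(row, pe_row)] for row, pe_row in zip(x, pe)]
    # Step 3: mix in context from every other position.
    return self_attention(x, w_q, w_k, w_v)
```

Each output row is a weighted average of the value vectors of all positions, with weights given by how strongly that position's query matches every key.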

2024-06-01 Tags: transformer, attention, self-attention, embeddings, nlp, deep learning, llm, machine learning by klotz

Exploring Google’s Latest AI Tools: A Beginner’s Guide

This article introduces Google's top AI applications, providing a guide on how to start using them, including Google Gemini, Google Cloud, TensorFlow, Experiments with Google, and AI Hub.

2024-05-29 Tags: llm, tools, google gemini, google cloud, tensorflow, vertex.ai by klotz

Scaling Monosemanticity: Anthropic’s One Step Towards Interpretable & Manipulable LLMs

An article discussing the concept of monosemanticity in large language models (LLMs) and how Anthropic is working on making them more controllable and safer through prompt and activation engineering.
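A minimal sketch of the activation-engineering idea, assuming an interpretable feature corresponds to a single direction in activation space. Anthropic's work finds such features with sparse autoencoders, which this sketch does not implement; the function names and the "clamp the projection to a target value" rule are illustrative assumptions, not Anthropic's code.

```python
import math


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def feature_activation(h, direction):
    """How strongly activation vector h expresses the feature: its projection on the unit direction."""
    d = unit(direction)
    return sum(a * b for a, b in zip(h, d))


def clamp_feature(h, direction, target):
    """Shift h along the feature direction so its projection equals target.
    Components orthogonal to the direction are left unchanged (assumed steering rule)."""
    d = unit(direction)
    delta = target - sum(a * b for a, b in zip(h, d))
    return [a + delta * b for a, b in zip(h, d)]
```

Clamping to a large target amplifies the feature in the model's downstream computation; clamping to zero suppresses it.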

2024-05-29 Tags: llm, neural networks, monosemanticity, polysemanticity, prompt engineering, anthropic by klotz

A Complete Guide to BERT with Code: History, Architecture, Pre-training, and Fine-tuning

In this article, we will explore various aspects of BERT, including the landscape at the time of its creation, a detailed breakdown of the model architecture, and a task-agnostic fine-tuning pipeline, demonstrated on sentiment analysis. Despite being one of the earliest LLMs, BERT remains relevant today and continues to find applications in both research and industry.
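As background for the pre-training part, the masked-language-modeling corruption described in the original BERT paper selects about 15% of tokens; of those, 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged. The sketch below is an illustration, not the article's code; the function name is hypothetical, and the ids used in the check (`[MASK]` = 103, vocabulary size 30,522, as in `bert-base-uncased`) are supplied as arguments rather than assumed in the code.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=frozenset(),
                mask_prob=0.15, seed=0):
    """BERT-style MLM corruption. Returns (corrupted_ids, labels), where labels
    is -100 at positions the loss should ignore (PyTorch's default ignore_index)."""
    rng = random.Random(seed)
    corrupted = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens such as [CLS] and [SEP].
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            corrupted[i] = mask_id                   # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token
    return corrupted, labels
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that only `[MASK]` positions need prediction, which narrows the gap with fine-tuning inputs that contain no masks.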

2024-05-28 Tags: bert, llm, embedding, google, nlp, encoder-only, transformer by klotz

Training and Finetuning Embedding Models with Sentence Transformers v3

This article explains how to use the Sentence Transformers library to finetune and train embedding models for a variety of applications, such as retrieval augmented generation, semantic search, and semantic textual similarity. It covers the training components, dataset format, loss function, training arguments, evaluators, and trainer.
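To make the loss-function component concrete, here is a plain-Python sketch of an in-batch-negatives ranking loss, the idea behind the library's `MultipleNegativesRankingLoss`, a common choice for (anchor, positive) pair datasets. The cosine similarity and the scale of 20 follow that loss's defaults as we understand them and should be treated as assumptions; the real implementation works on batched tensors, not Python lists.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy in which anchor i must score positive i above every
    other positive in the batch (those act as free negatives)."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cosine(a, p) for p in positives]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax(logits)[i]
    return total / len(anchors)
```

Larger batches supply more negatives per anchor, which is one reason batch size matters when training with this kind of loss.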

2024-05-28 Tags: sentence transformers, finetune, embedding, models, similarity, llm, huggingface by klotz

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

This paper introduces Cross-Layer Attention (CLA), an extension of Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) for reducing the size of the key-value cache in transformer-based autoregressive large language models (LLMs). The authors demonstrate that CLA can reduce the cache size by another 2x while maintaining nearly the same accuracy as unmodified MQA, enabling inference with longer sequence lengths and larger batch sizes.
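The savings can be estimated with simple arithmetic. The sketch assumes the usual cache-size formula (keys plus values, for every layer that stores them, every KV head, every position, and every sequence in the batch) and models CLA as letting `cla_share` adjacent layers reuse one set of keys and values. The function name and the 32-layer, 32-head, 128-dim, 4,096-token, fp16 configuration in the check are hypothetical.

```python
import math


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch,
                   bytes_per_elem=2, cla_share=1):
    """Approximate KV-cache size in bytes.
    n_kv_heads: equal to the query heads for MHA, 1 for MQA, in between for GQA.
    cla_share: adjacent layers sharing one KV pair under CLA (1 = no sharing)."""
    kv_layers = math.ceil(n_layers / cla_share)
    # Factor 2 accounts for storing both keys and values.
    return 2 * kv_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem
```

For the hypothetical configuration, MHA needs 2 GiB, GQA with 8 KV heads 512 MiB, MQA 64 MiB, and MQA with two-layer sharing 32 MiB, the additional 2x reduction the paper reports over MQA.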

2024-05-26 Tags: transformer, autoregressive language models, key-value cache, attention, multiquery attention, cross-layer attention, machine learning, computer science, llm, mit, csail by klotz

Google launches ‘Model Explorer’, an open source tool for seamless AI model visualization and debugging

Google has launched Model Explorer, an open-source tool designed to help users navigate and understand complex neural networks. The tool aims to provide a hierarchical approach to AI model visualization, enabling smooth navigation even for massive models. Model Explorer has already proved valuable in the deployment of large models to resource-constrained platforms and is part of Google's broader ‘AI on the Edge’ initiative.

2024-05-20 Tags: google, llm, machine learning, visualization by klotz

ChatGPT Glossary: 44 AI Terms That Everyone Should Know

Stay informed about the latest artificial intelligence (AI) terminology with this comprehensive glossary. From algorithm and AI ethics to generative AI and overfitting, learn the essential AI terms that will help you sound smart over drinks or impress in a job interview.
